Book Scanning

Book scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner. Large-scale book scanning projects have made many books available online. Digital books can be easily distributed, reproduced, and read on-screen. Image scanners may be manual or automated. After scanning, software adjusts the document images by aligning, cropping, and editing them, and then converts them to text and a final e-book form. Scanning resolution for book digitization varies depending on the purpose and nature of the material. High-end scanners capable of thousands of pages per hour can cost thousands of dollars. Projects such as Project Gutenberg, the Million Book Project, Google Books, and the Open Content Alliance scan books on a large scale.


Description
Book scanning is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner. Large-scale book scanning projects have made many books available online.


Use
Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and Tag Image File Format (TIFF). Optical character recognition (OCR) is used to convert the raw page images into a digital text format such as ASCII, which reduces the file size and allows the text to be reformatted, searched, or processed by other applications.
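
As a rough illustration of why OCR text is easier to search than page images, the sketch below builds a small word-to-page index. The function name build_index and the sample page texts are hypothetical, and the OCR step itself is assumed to have already run.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the sorted page numbers that contain it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}

# Hypothetical OCR output keyed by page number (not real scan data).
ocr_pages = {
    1: "The book is placed on a flat glass plate.",
    2: "A light and optical array moves underneath the glass.",
    3: "Human proofreaders check the output for errors.",
}

index = build_index(ocr_pages)
print(index["glass"])  # page numbers whose text mentions "glass"
```

A lookup such as index.get("spine") returns None when a word never occurs, which a caller can treat as "no matches".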

Image scanners may be manual or automated. In an ordinary commercial image scanner, the book is placed on a flat glass plate (or platen), and a light and optical array moves across the book underneath the glass. In manual book scanners, the glass plate extends to the edge of the scanner, making it easier to line up the book's spine.


Software
After scanning, software adjusts the document images by aligning, cropping, and editing them, and then converts them to text and a final e-book form. Human proofreaders usually check the output for errors.
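
The cropping step can be sketched as follows. This is a minimal illustration, not the method used by any particular scanning package: it assumes a grayscale page stored as a list of pixel rows (0 is black, 255 is white), and the names content_bounds and crop, along with the threshold of 200, are hypothetical choices.

```python
def content_bounds(image, threshold=200):
    """Return (top, left, bottom, right) around pixels darker than threshold.

    Returns None when the image is empty or contains no dark pixels.
    """
    if not image or not image[0]:
        return None
    rows = [r for r, row in enumerate(image) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0]))
            if any(row[c] < threshold for row in image)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

def crop(image, threshold=200):
    """Trim the blank margins around the page content."""
    bounds = content_bounds(image, threshold)
    if bounds is None:
        return image  # nothing to crop
    top, left, bottom, right = bounds
    return [row[left:right] for row in image[top:bottom]]

# Tiny hypothetical "page": white margins around a few dark pixels.
page = [
    [255, 255, 255, 255, 255],
    [255,  30,  40, 255, 255],
    [255, 255,  20, 255, 255],
    [255, 255, 255, 255, 255],
]
cropped = crop(page)
print(cropped)
```

Real pipelines typically also straighten (deskew) the page before cropping; that step is omitted here.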


Resolution
Scanning resolution for book digitization varies depending on the purpose and nature of the material. While a lower resolution is generally adequate for text conversion, archival institutions recommend higher resolutions for preservation and rare materials. The National Archives of Australia suggests 400 ppi for bound books and 600 ppi for rare or significant documents, while the Federal Agencies Digitization Guidelines Initiative (FADGI) recommends a minimum of 400 ppi for archival materials.

These higher resolutions ensure the capture of fine details and support long-term preservation efforts, while a tiered approach balances quality with practical constraints such as storage capacity and resource limitations. This strategy allows institutions to optimize digitization efforts, applying higher resolutions selectively to rare or significant materials while using standard resolutions for more common documents.
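
To make the storage trade-off concrete, the following sketch estimates the pixel dimensions and uncompressed size of one page image at the resolutions mentioned above. The 6 by 9 inch page size and 24-bit color depth are assumptions chosen for illustration, and scan_size is a hypothetical helper; real files are usually compressed and smaller.

```python
def scan_size(width_in, height_in, ppi, bits_per_pixel=24):
    """Return (width_px, height_px, uncompressed_bytes) for one page image."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes

# Assumed page size: 6 x 9 inches, scanned in 24-bit color.
for ppi in (400, 600):
    width_px, height_px, size_bytes = scan_size(6, 9, ppi)
    print(f"{ppi} ppi: {width_px} x {height_px} px, "
          f"{size_bytes / 1_000_000:.1f} MB uncompressed")
```

Because pixel count grows with the square of the resolution, moving from 400 ppi to 600 ppi multiplies the uncompressed size by 2.25, which is one reason a tiered approach reserves the higher setting for rare material.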


Book scanners
High-end scanners capable of thousands of pages per hour can cost thousands of dollars, but do-it-yourself (DIY) manual book scanners capable of 1,200 pages per hour have been built for US$300.


Commercial book scanners
Commercial book scanners are not like normal scanners; they usually consist of a high-quality digital camera with light sources on either side, mounted on a frame that gives a person or machine easy access to flip through the pages of the book. Some models have V-shaped book cradles, which support the book's spine and also center the book automatically.

The advantage of this type of scanner is its speed compared with overhead scanners.


Large-scale projects
Projects such as Project Gutenberg (est. 1971), the Million Book Project (est. circa 2001), Google Books (est. 2004), and the Open Content Alliance (est. 2005) scan books on a large scale.

One of the main challenges is the sheer volume of books that must be scanned. In 2010, the total number of works published as books in human history was estimated at around 130 million. All of these must be scanned and then made searchable online for the public to use as a universal library. Large organizations currently rely on three main approaches: outsourcing, scanning in-house using commercial book scanners, and scanning in-house using robotic scanning solutions.
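
A back-of-the-envelope calculation gives a sense of this scale. The sketch below uses the 130 million estimate together with the scanner speeds quoted elsewhere in this article (1,200 and 2,900 pages per hour); the average of 300 pages per book and nonstop operation are assumptions made purely for illustration, and scanner_years is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365

def scanner_years(total_books, pages_per_book, pages_per_hour):
    """Years a single scanner would need, running nonstop, to scan every page."""
    total_pages = total_books * pages_per_book
    return total_pages / pages_per_hour / HOURS_PER_YEAR

TOTAL_BOOKS = 130_000_000  # estimate quoted in the text
PAGES_PER_BOOK = 300       # hypothetical average, not from the source

for rate in (1_200, 2_900):  # pages per hour quoted in this article
    years = scanner_years(TOTAL_BOOKS, PAGES_PER_BOOK, rate)
    print(f"{rate} pages/hour: about {years:,.0f} scanner-years")
```

Under these assumptions even the fastest quoted machine would need well over a thousand scanner-years, so such projects depend on many scanners working in parallel.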

With outsourcing, books are often shipped to low-cost providers to be scanned. Alternatively, for reasons of convenience, safety, and improved technology, many organizations choose to scan in-house, using either overhead scanners, which are time-consuming, or digital camera-based scanning machines, which are substantially faster and are used by the Internet Archive as well as Google. Traditional methods have included cutting off the book's spine and scanning the pages in a scanner with automatic page-feeding capability, with subsequent rebinding of the loose pages.

Once a page is scanned, its text is either entered manually or extracted via OCR, which is another major cost of book scanning projects.

Due to copyright issues, most scanned books are those that are out of copyright; however, Google Books is known to scan books still protected under copyright unless the publisher specifically prohibits this.


Collaborative projects
There are many collaborative digitization projects throughout the United States. Two of the earliest were the Collaborative Digitization Project in Colorado and NC ECHO (North Carolina Exploring Cultural Heritage Online), based at the State Library of North Carolina.

These projects establish and publish best practices for digitization and work with regional partners to digitize cultural heritage materials. Additional criteria for best practices have more recently been established in the UK, Australia, and the European Union. Wisconsin Heritage Online is a collaborative digitization project modeled after the Colorado Collaborative Digitization Project. Wisconsin uses a wiki to build and distribute collaborative documentation. Georgia's collaborative digitization program, the Digital Library of Georgia, presents a seamless virtual library on the state's history and life, including more than a hundred digital collections from 60 institutions and 100 agencies of government. The Digital Library of Georgia is a GALILEO initiative based at the University of Georgia Libraries.

In the twentieth century, the Hill Museum and Manuscript Library photographed books in Ethiopia that were subsequently destroyed amidst political violence in 1975. The library has since worked to photograph manuscripts in Middle Eastern countries.

In South Asia, the Nanakshahi Trust is digitizing manuscripts in Gurmukhī script.

In Australia, there have been many collaborative projects between the National Library of Australia and universities to improve the repository infrastructure in which digitized information is stored (Libraries in the Twenty-First Century: Charting New Directions in Information Services, edited by Stuart Ferguson, 2007, p. 84). These projects include the ARROW (Australian Research Repositories Online to the World) project and the APSR (Australian Partnership for Sustainable Repository) project.


Methods


Scanning preparation
A problem with scanning bound books is that when a book that is not very thin is laid flat, the part of the page close to the spine (the gutter) is significantly curved, distorting the text in that part of the scan. One solution is to separate the book into separate pages by cutting or unbinding. A non-destructive method is to hold the book in a V-shaped holder and photograph it, rather than lay it flat and scan it. The curvature in the gutter is much less pronounced this way. Pages may be turned by hand or by automated paper transport devices. Transparent plastic or glass sheets are usually pressed against the page to flatten it.


Destructive scanning methods
For book scanning on a low budget, the least expensive way to scan a book or magazine is to cut off the binding. This converts the book or magazine into a sheaf of separate sheets which can be loaded into a standard automatic document feeder (ADF) and scanned using inexpensive and common scanning technology. The method is not suitable for rare or valuable books. There are two technical difficulties with this process, first with the cutting and second with the scanning.


Unbinding
More precise and less destructive than cutting pages is to unbind by hand using suitable tools. This technique has been successfully employed for tens of thousands of pages of archival original paper scanned for the Riazanov Library digital archive project from newspapers and magazines and pamphlets, varying from 50 to 100 years old and more, and often composed of fragile, brittle paper. Although the monetary value for some collectors (and for most sellers of this sort of material) is destroyed by unbinding, it in many cases actually greatly assists preservation of the pages, making them more accessible to researchers and less likely to be damaged when subsequently examined. A disadvantage is that unbound stacks of pages are "fluffed up", and therefore more exposed to oxygen in the air, which may in some cases speed deterioration. This can be addressed by putting weights on the pages after they are unbound, and storage in appropriate containers.


Robotic book scanners
A robotic or automated book scanner is a device that digitizes printed books by using robotic systems to turn pages and capture images of each page without the need for human hands to touch the book. The scanner consists of a mechanism to automatically turn pages, one or more cameras to photograph each page, and software to compile these images into a digital file. These scanners are used to digitize large quantities of books quickly. Some models allow for manual operation if a book is too delicate or complex for the robot to handle alone. The process is designed to be gentle on books, often using special cradles and glass plates to avoid damage during scanning (Sinmaz, E. K., Kocaseçer, M., & Ayyildiz, M. (2022). The Effect of Book Preconditioning on Page-Turning Success Rate during Automated Book Digitization. Instruments & Experimental Techniques, 65(5), 826–833. https://doi.org/10.1134/S0020441222050281).

Most high-end commercial robotic scanners use air and suction technology to turn and separate pages. These scanners use a vacuum or air suction to gently lift a page from the stack, while a puff of air turns the page over, allowing the device to scan both sides efficiently. Some use newer approaches such as bionic fingers for turning pages. Some scanners use ultrasonic or photoelectric sensors to detect when two pages have been turned at once and so prevent pages from being skipped. With reports of machines scanning up to 2,900 pages per hour, robotic book scanners are specifically designed for large-scale digitization projects.

Google's patent 7508978 describes an infrared camera technology that detects the three-dimensional shape of the page and automatically adjusts for it (Maureen Clements, "The Secret Of Google's Book Scanning Machine Revealed", April 30, 2009). Robotic book scanners that use air and suction technology rely on specialized systems to turn and separate pages without damaging fragile or rare books.

